Use load+store instead of memcpy for small integer arrays #111999
Conversation
r? @b-naber (rustbot has picked a reviewer for you, use r? to override)
Force-pushed from 1310aed to 44c45ca
@bors try @rust-timer queue
⌛ Trying commit 44c45ca0492db8f40cdcb19ff7dbfca05e95be75 with merge daa8f4a9835d98cf1139c088d10b29b4e8e084a5...
☀️ Try build successful - checks-actions
Finished benchmarking commit (daa8f4a9835d98cf1139c088d10b29b4e8e084a5): comparison URL.
Overall result: ❌ regressions - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): a less reliable metric that may be of interest but was not used to determine the overall result.
Cycles: a less reliable metric that may be of interest but was not used to determine the overall result.
Binary size: a less reliable metric that may be of interest but was not used to determine the overall result.
Bootstrap: 643.928s -> 645.194s (0.20%)
r? compiler
if flags == MemFlags::empty()
    && let Some(bty) = bx.cx().scalar_copy_backend_type(layout)
{
    // I look forward to only supporting opaque pointers
Maybe a FIXME so someone may remove this when llvm only has opaque ptrs? (Well, I guess removing this logic would also preclude other backends with typed ptrs, too. In that case, maybe no comment at all.)
Yeah, it's a bigger question because GCC would need to move, so I'll leave it tracked by conversations like https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/llvm.20bitcasts.20in.20codegen/near/356591425 instead of something specific in this place.
///
/// This *should* return `None` for particularly-large types, where leaving
/// the `memcpy` may well be important to avoid code size explosion.
fn scalar_copy_backend_type(&self, layout: TyAndLayout<'tcx>) -> Option<Self::Type> {
Having a default impl here may make it not obvious that other backends should emit typed load/store. Is it bad style to just add this to cranelift and codegen_gcc here too?
Well, cg_clif doesn't use cg_ssa, so it's not impacted.
I have no practical way to test cg_gcc, being on Windows, and I also don't have any information that doing something other than memcpy would actually be an improvement for it, so I figured I'd just leave the default here, since it's a semantically sufficient implementation.
Notably, GCC apparently doesn't have the poison semantics that nikic mentioned as being the problem for better optimizing this:
rust/compiler/rustc_codegen_gcc/src/common.rs, lines 76 to 79 in f8447b9:
fn const_poison(&self, typ: Type<'gcc>) -> RValue<'gcc> {
    // No distinction between undef and poison.
    self.const_undef(typ)
}
so indeed it might just never need to do this.
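To make the design point about the default implementation concrete, here is a standalone toy sketch of the default-vs-override pattern being discussed. The trait, the stand-in backend types, and the size cutoff are our own illustrative assumptions, not rustc's actual LayoutTypeMethods code.

trait ScalarCopySketch {
    type BackendType;

    // Default: no special scalar-copy type, so callers keep using memcpy.
    fn scalar_copy_backend_type(&self, _size_bytes: u64) -> Option<Self::BackendType> {
        None
    }
}

// A backend that just takes the default stays on the memcpy path.
struct MemcpyOnlyBackend;
impl ScalarCopySketch for MemcpyOnlyBackend {
    type BackendType = String;
}

// A backend can override the method to opt in to typed load/store for
// small copies, mirroring the "return None for particularly-large types"
// guidance in the doc comment above.
struct VectorCopyBackend;
impl ScalarCopySketch for VectorCopyBackend {
    type BackendType = String; // stand-in for a real backend type handle

    fn scalar_copy_backend_type(&self, size_bytes: u64) -> Option<Self::BackendType> {
        (size_bytes <= 16).then(|| format!("<{size_bytes} x i8>"))
    }
}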
let src = bx.pointercast(src, pty);
let dst = bx.pointercast(dst, pty);
When do these not match pty? Why not just return a more "accurate" type in scalar_copy_llvm_type? Like the actual LLVM type that corresponds to src/dst? (Or am I misunderstanding something?)
(Well, the shallow answer is that they don't match because it's -> Option<Type>, not should_load_store_instead_of_memcpy() -> bool.)
Because emitting these as the backend type doesn't necessarily do what we want. Notably, this PR is emitting the load/store as <4 x i8>, the vector type, rather than the llvm_backend_type of [4 x i8] for arrays.
I tried using the LLVM type first, but with arrays that results in exploding out the IR: https://llvm.godbolt.org/z/vjjsdea9e
Optimizing the version that loads/stores arrays
define void @replace_short_array_using_arrays(ptr noalias nocapture noundef sret([3 x i32]) dereferenceable(12) %0, ptr noalias noundef align 4 dereferenceable(12) %r, ptr noalias nocapture noundef readonly dereferenceable(12) %v) unnamed_addr #0 {
start:
%1 = load [3 x i32], ptr %r, align 4
store [3 x i32] %1, ptr %0, align 4
%2 = load [3 x i32], ptr %v, align 4
store [3 x i32] %2, ptr %r, align 4
ret void
}
gives
define void @replace_short_array_using_arrays(ptr noalias nocapture noundef writeonly sret([3 x i32]) dereferenceable(12) %0, ptr noalias nocapture noundef align 4 dereferenceable(12) %r, ptr noalias nocapture noundef readonly dereferenceable(12) %v) unnamed_addr #0 {
%.unpack = load i32, ptr %r, align 4
%.elt1 = getelementptr inbounds [3 x i32], ptr %r, i64 0, i64 1
%.unpack2 = load i32, ptr %.elt1, align 4
%.elt3 = getelementptr inbounds [3 x i32], ptr %r, i64 0, i64 2
%.unpack4 = load i32, ptr %.elt3, align 4
store i32 %.unpack, ptr %0, align 4
%.repack5 = getelementptr inbounds [3 x i32], ptr %0, i64 0, i64 1
store i32 %.unpack2, ptr %.repack5, align 4
%.repack7 = getelementptr inbounds [3 x i32], ptr %0, i64 0, i64 2
store i32 %.unpack4, ptr %.repack7, align 4
%.unpack9 = load i32, ptr %v, align 4
%.elt10 = getelementptr inbounds [3 x i32], ptr %v, i64 0, i64 1
%.unpack11 = load i32, ptr %.elt10, align 4
%.elt12 = getelementptr inbounds [3 x i32], ptr %v, i64 0, i64 2
%.unpack13 = load i32, ptr %.elt12, align 4
store i32 %.unpack9, ptr %r, align 4
store i32 %.unpack11, ptr %.elt1, align 4
store i32 %.unpack13, ptr %.elt3, align 4
ret void
}
whereas optimizing the version with vectors leaves the operations together
define void @replace_short_array_using_vectors(ptr noalias nocapture noundef writeonly sret([3 x i32]) dereferenceable(12) %0, ptr noalias nocapture noundef align 4 dereferenceable(12) %r, ptr noalias nocapture noundef readonly dereferenceable(12) %v) unnamed_addr #0 {
%1 = load <3 x i32>, ptr %r, align 4
store <3 x i32> %1, ptr %0, align 4
%2 = load <3 x i32>, ptr %v, align 4
store <3 x i32> %2, ptr %r, align 4
ret void
}
for the backend to legalize instead.
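For context, a minimal Rust function of the shape these IR examples correspond to. This is a stand-in we wrote to match the replace_short_array_* signatures above (a 12-byte [i32; 3] replaced by value), not code copied from the godbolt link.

use std::mem;

// Replacing a small [i32; 3]: the 12-byte copy that the IR above loads and
// stores as a single typed value instead of lowering to a memcpy call.
pub fn replace_short_array(r: &mut [i32; 3], v: [i32; 3]) -> [i32; 3] {
    mem::replace(r, v)
}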
Oh, and I originally expected to do this with LLVM's arbitrary-length integer support https://llvm.godbolt.org/z/hzG9aeqhM, like I did for comparisons back in #85828
define void @replace_short_array_using_long_integer(ptr noalias nocapture noundef sret([3 x i32]) dereferenceable(12) %0, ptr noalias noundef align 4 dereferenceable(12) %r, ptr noalias nocapture noundef readonly dereferenceable(12) %v) unnamed_addr #0 {
start:
%1 = load i96, ptr %r, align 4
store i96 %1, ptr %0, align 4
%2 = load i96, ptr %v, align 4
store i96 %2, ptr %r, align 4
ret void
}
But that broke some of the autovectorization codegen tests for operations on arrays.
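For illustration, a hypothetical example we added (not one of the actual codegen tests) of the kind of array operation whose autovectorization those tests check: element-wise arithmetic over a small fixed-size array.

// Element-wise addition over a small array; LLVM is expected to
// autovectorize the loop, which is what the codegen tests assert.
pub fn add_arrays(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}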
Yeah, OK, so we explicitly want to lower these moves to vector types, specifically because they optimize better than arrays.
To confirm we're not just helping `mem::swap`
Force-pushed from 44c45ca to e1b020d
Finished benchmarking commit (fd9bf59): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): a less reliable metric that may be of interest but was not used to determine the overall result.
Cycles: a less reliable metric that may be of interest but was not used to determine the overall result.
Binary size: a less reliable metric that may be of interest but was not used to determine the overall result.
Bootstrap: 647.471s -> 646.658s (-0.13%)
@rustbot label: +perf-regression-triaged
I noticed a quite large performance regression (15-20%) in an analytical database I'm building that uses [u8; 24] and [u8; 28] arrays. I bisected the regression to between nightly commits e6d4725 and b2b34bd. This PR was the obvious suspect, since it changed how arrays that small are copied around, and there are a lot of small arrays being moved around in the database I'm building. I didn't notice any performance regressions for other data types (u128 or u64, etc.). I've only tested on aarch64-apple-darwin so far. I haven't published my work yet, but I can come up with a benchmark program if needed.
(In case you don't already know about it: cargo-bisect-rustc should bisect within the rollup PR of that last commit, to hopefully give you the offending PR, or to validate or rule out that it's this one.)
I'll verify the offending PR using cargo-bisect-rustc. In the meantime, here's a simple benchmark which reproduces the issue on my MacBook M1 Pro:
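The reporter's actual benchmark program is not preserved in this thread. As a stand-in, here is a hypothetical sketch of the kind of workload described (moving [u8; 24] values by value in a hot loop); the sizes, loop counts, and structure are our own assumptions, not the original benchmark.

use std::hint::black_box;
use std::time::Instant;

// Hypothetical reproduction sketch: shuffle small [u8; 24] arrays around
// by value in a hot loop, the copy pattern the report says regressed.
fn main() {
    let mut buf = [[0u8; 24]; 1024];
    let start = Instant::now();
    for i in 0..50_000_000usize {
        let idx = i % buf.len();
        let v = black_box(buf[idx]);        // by-value copy out
        buf[(idx + 7) % buf.len()] = v;     // by-value copy back in
    }
    black_box(&buf);
    println!("elapsed: {:?}", start.elapsed());
}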
Please open an issue so we can track this. Closed PRs aren't a good place for it.
Will do.
Remove my `scalar_copy_backend_type` optimization attempt
I added this back in rust-lang#111999, but I no longer think it's a good idea:
- It had to get scaled back to only power-of-two things to not break a bunch of targets
- LLVM seems to be getting better at memcpy removal anyway
- Introducing vector instructions has seemed to sometimes (rust-lang#115515 (comment)) make autovectorization worse
So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy`, so things will still use load/store when a type isn't `OperandValue::Ref`.
I was inspired by #98892 to see whether, rather than making mem::swap do something smart in the library, we could update MIR assignments like *_1 = *_2 to do something smarter than memcpy for sufficiently-small types, where doing it inline is going to be better than a memcpy call in assembly anyway. After all, special code may help mem::swap, but if the "obvious" MIR can just result in the correct thing, that helps everything -- other code like mem::replace, people doing it manually, and just passing around by value in general -- as well as making MIR inlining happier, since it doesn't need to deal with all the complicated library code if it just sees a couple of assignments.
LLVM will turn the short, known-length memcpys into direct instructions in the backend, but that's too late for it to be able to remove allocas. In general, replacing memcpys with typed instructions is hard in the middle-end -- even for memcpy.inline, where it knows it won't be a function call -- due to poison propagation issues. So because we know more about the type invariants -- these are typed copies -- rustc can emit something more specific, allowing LLVM to mem2reg away the allocas in some situations.
#52051 previously did something like this in the library for mem::swap, but it ended up regressing when MIR inlining was enabled (cbbf06b), so this has been suboptimal on stable for ≈5 releases now.
The code in this PR is narrowly targeted at just integer arrays in LLVM, but works via a new method on the LayoutTypeMethods trait, so specific backends based on cg_ssa can enable this for more situations over time, as we find them. I don't want to try to bite off too much in this PR, though. (Transparent newtypes and simple things like the 3×usize String would be obvious candidates for a follow-up.)
Codegen demonstrations: https://llvm.godbolt.org/z/fK8hT9aqv
Before:
Note it going to the stack:
Now:
Still lowers to dword+word operations, but has no stack traffic:
And as a demonstration that this isn't just mem::swap, a mem::replace on a small array (since replace doesn't use swap, since #83022), which used to be memcpys in LLVM, changes in IR but still lowers to reasonable dword+qword instructions.
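For reference, a minimal Rust sketch of the by-value patterns the description says benefit from typed copies instead of memcpy: the plain *_1 = *_2 style assignment and mem::swap on a small array. The function names and the [u32; 3] type are our own illustrative choices, not taken from the linked godbolt demonstrations.

use std::mem;

// Plain by-value copy: the `*_1 = *_2`-style MIR assignment discussed above.
pub fn copy_short_array(dst: &mut [u32; 3], src: &[u32; 3]) {
    *dst = *src;
}

// mem::swap on the same small array type, the case #98892 was about.
pub fn swap_short_arrays(a: &mut [u32; 3], b: &mut [u32; 3]) {
    mem::swap(a, b);
}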